-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set auto_provisioning_defaults.service_account #639
Set auto_provisioning_defaults.service_account #639
Conversation
Thanks for the PR! 🚀 |
I have verified that this fixes NAP on my cluster. I was also able to stand up a cluster with the default settings and saw that NAP was not enabled. As for the CI failure above, I'm not sure if I caused that or how. I get some similar errors when I run |
@dpetersen Could you rebase on master? We had some issues with tf0.13 release that has since been fixed on master. |
2f31c3e
to
4bed7e6
Compare
Doh, @bharathkkb I saw your email this weekend and didn't notice your update until I'd already rebased and force pushed. Looks like one job is passing, but linting is not. I am unable to see either of the jobs in Cloud Build to see what failed. I did manually try and replicate what |
4bed7e6
to
f5e85e0
Compare
I have rebased this with master and also added one additional overridable attribute for node auto-provisioning that came up in my testing. I noticed that my NAP (node auto-provisioning) nodes could not pull images from my private registry, despite running as a service account that had access. I realized that those nodes had different oAuth scopes to my manual node:
my other pool, set up with this module, had the module's default:
The |
@dpetersen If scopes are not provided, we should probably pick up whatever scopes are specified on the existing node pools (maybe the first node pool), which also default to |
That makes sense, I will try and figure it out. Unfortunately, I'm running into a few unrelated issues here. First, I think #676 is breaking my local |
Well, I learned something new today. The terraform-google-kubernetes-engine/autogen/main/variables.tf.tmpl Lines 211 to 229 in dffb047
There is a long-running, open Github issue on the topic, but if I add It probably doesn't make sense for oAuth scopes to be required to enable NAP. And this isn't the last NAP-related override that should probably be optional. At some point, surge upgrade settings for NAP will be supported in the provider, and that should certainly be optional. I think you could probably make the argument that So the scenario your described above is not possible with this configuration, you can't enable NAP and not specify your oAuth scopes. I suppose you could specify an empty list, but that would be confusing to use. I think I should probably make a new, separate variable, like |
Thanks for investigating and I had forgotten the issue around defaults for objects. In this case, I think we actually have a nice solution: simply reference the "all" value for |
That makes sense and works great for me. I have updated the PR and removed that variable. |
Ugh, I found one additional issue. I set up a minimal cluster to test one of my other PRs, and that cluster did not have NAP enabled. But the plan would continually say it wanted to change my cluster with:
It would apply cleanly, but then continue to have the same desired changes over and over. So I think maybe the API doesn't save your auto-provisioning defaults when autoscaling is disabled. That code for a conditional block (which I stole from a medium article) feels like a hack, but it does seem to work. |
This sets the Service Account that should be used by node VMs created by node auto-provisioning. This should cause the auto-provisioned nodes to have the same permissions as the nodes that are manually provisioned.
The defaults I included come from the scopes I observed in a cluster I stood up when no scopes were specified. I am assuming these are GKE defaults. This does not match the default scopes for normal node pools in this Terraform module, so it may not be the correct choice.
The API will not reject the defaults, but it also does not save them when auto-provisioning is disabled. Which means your plan will continually attempt to update the attributes, succeed, and then still think they need to be updated.
a3919cb
to
b83d8d8
Compare
Hey everybody, I'm back to pester you about this pull request. We have been running this for a couple of months, but we're upgrading to Terraform 0.14 so I have rebased this pull request on As far as I know, I have implemented the requested changes from your feedback. Ever since I first created this pull request, CI has been failing. Unfortunately, I don't have rights to view the failure to see what the issue is. Please let me know if there's anything I can do to get this in a release version of the module. Thanks so much, this module saves us a ton of time! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delayed response.
The failing test is:
expected: {"autoprovisioningNodePoolDefaults"=>{"oauthScopes"=>["https://www.googleapis.com/auth/logging.write"...inimum"=>"5", "resourceType"=>"cpu"}, {"maximum"=>"30", "minimum"=>"10", "resourceType"=>"memory"}]}
got: {"autoprovisioningNodePoolDefaults"=>{"oauthScopes"=>["https://www.googleapis.com/auth/cloud-platform...inimum"=>"5", "resourceType"=>"cpu"}, {"maximum"=>"30", "minimum"=>"10", "resourceType"=>"memory"}]}
It is configured here with this fixture.
I don't expect this to fix the tests, there is still the issue of the serviceAccount.
Oh, thank you. I wish it wasn't truncating the hash like that, but the scopes being different does make sense. I have just pushed up a commit that should fix that, but I don't expect the tests to pass. I'm trying to understand how these specs are run. I'm fairly certain that CI will still fail, but now about the node pool service account. The specs expect |
@dpetersen Looks like it succeeded after all. :) FYI for the future you can run tests locally to verify: https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/blob/master/CONTRIBUTING.md#integration-testing |
Doh! I went looking for that in the README, I didn't notice the other file. Thanks again, really glad this is in master, and I look forward to the next release. I appreciate all your help with this, and your work on the modules in this ecosystem, they're working great for us. |
This sets the Service Account that should be used by node VMs created by
node auto-provisioning. This should cause the auto-provisioned nodes to
have the same permissions as the nodes that are manually provisioned.
This is to address #560. I will be testing this in my cluster that has NAP enabled. As far as I can tell, it's safe to set this value whether auto-provisioning is actually enabled or disabled, and that using
local.service_account
is the right default. That's the default for any manualnode_pool
that isn't specifically overridden.